
    Sampling-based optimization with mixtures

    Sampling-based Evolutionary Algorithms (EA) are of great use when dealing with highly non-convex and/or noisy optimization tasks, which are the kind of tasks we often have to solve in Machine Learning. Two derivative-free examples of such methods are Estimation of Distribution Algorithms (EDA) and techniques based on the Cross-Entropy Method (CEM). One of the main problems these algorithms have to solve is finding a good surrogate model for the normalized target function, that is, a model that is complex enough to fit this target function but keeps the computations simple. Gaussian mixture models have been applied in practice with great success, but most of these approaches lacked a solid theoretical foundation. In this paper we describe a sound mathematical justification for Gaussian mixture surrogate models; more precisely, we propose a proper derivation of an EDA/CEM algorithm with mixture updates using Expectation Maximization techniques. It will appear that this algorithm resembles the recent Population MCMC schemes, thus reinforcing the link between Monte Carlo integration methods and sampling-based optimization. We will concentrate throughout this paper on continuous optimization.
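
    As a rough illustration of the sample, select, refit loop behind CEM, here is a minimal sketch that uses a single diagonal Gaussian rather than the mixture model with EM updates that the abstract describes. The function name `cem_minimize` and all of its parameters are hypothetical and chosen only for this example.

```python
import numpy as np

def cem_minimize(f, mean, std, n_samples=100, n_elite=10, n_iters=50, seed=0):
    """Basic cross-entropy method with ONE diagonal Gaussian (not a mixture)."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    for _ in range(n_iters):
        # Draw candidate solutions from the current sampling distribution.
        samples = rng.normal(mean, std, size=(n_samples, mean.size))
        scores = np.array([f(x) for x in samples])
        # Keep the candidates with the lowest objective values (the elite set).
        elite = samples[np.argsort(scores)[:n_elite]]
        # Refit the Gaussian to the elite set; the small constant keeps std positive.
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-8
    return mean

# Usage: minimize a shifted quadratic whose optimum is at (3, 3).
best = cem_minimize(lambda x: float(np.sum((x - 3.0) ** 2)),
                    mean=[0.0, 0.0], std=[2.0, 2.0])
```

    A mixture surrogate would replace the mean and standard deviation refit with a weighted mixture fit, which is where the EM-based derivation mentioned in the abstract comes in.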

    On two ways to use determinantal point processes for Monte Carlo integration -- Long version

    When approximating an integral by a weighted sum of function evaluations, determinantal point processes (DPPs) provide a way to enforce repulsion between the evaluation points. This negative dependence is encoded by a kernel. Fifteen years before the discovery of DPPs, Ermakov & Zolotukhin (EZ, 1960) had the intuition of sampling a DPP and solving a linear system to compute an unbiased Monte Carlo estimator of the integral. In the absence of DPP machinery to derive an efficient sampler and analyze their estimator, the idea of Monte Carlo integration with DPPs was stored in the cellar of numerical integration. Recently, Bardenet & Hardy (BH, 2019) came up with a more natural estimator with a fast central limit theorem (CLT). In this paper, we first take the EZ estimator out of the cellar and analyze it using modern arguments. Second, we provide an efficient implementation to sample exactly a particular multidimensional DPP called the multivariate Jacobi ensemble. The latter satisfies the assumptions of the aforementioned CLT. Third, our new implementation lets us investigate the behavior of the two unbiased Monte Carlo estimators in yet unexplored regimes. We demonstrate experimentally good properties when the kernel is adapted to a basis of functions in which the integrand is sparse or has fast-decaying coefficients. If such a basis and the level of sparsity are known (e.g., we integrate a linear combination of kernel eigenfunctions), the EZ estimator can be the right choice, but otherwise it can display erratic behavior.

    On two ways to use determinantal point processes for Monte Carlo integration

    This paper focuses on Monte Carlo integration with determinantal point processes (DPPs), which enforce negative dependence between quadrature nodes. We survey the properties of two unbiased Monte Carlo estimators of the integral of interest: a direct one proposed by Bardenet & Hardy (2016) and a less obvious 60-year-old estimator by Ermakov & Zolotukhin (1960) that actually also relies on DPPs. We provide an efficient implementation to sample exactly a particular multidimensional DPP called the multivariate Jacobi ensemble. This lets us investigate the behavior of both estimators on toy problems in yet unexplored regimes.

    Algorithms for Hyper-Parameter Optimization

    Several recent advances to the state of the art in image classification benchmarks have come from better configurations of existing techniques rather than novel approaches to feature learning. Traditionally, hyper-parameter optimization has been the job of humans because they can be very efficient in regimes where only a few trials are possible. Presently, computer clusters and GPU processors make it possible to run more trials and we show that algorithmic approaches can find better results. We present hyper-parameter optimization results on tasks of training neural networks and deep belief networks (DBNs). We optimize hyper-parameters using random search and two new greedy sequential methods based on the expected improvement criterion. Random search has been shown to be sufficiently efficient for learning neural networks for several datasets, but we show it is unreliable for training DBNs. The sequential algorithms are applied to the most difficult DBN learning problems from [1] and find significantly better results than the best previously reported. This work contributes novel techniques for making response surface models P(y|x) in which many elements of hyper-parameter assignment (x) are known to be irrelevant given particular values of other elements.
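
    For readers unfamiliar with the expected improvement criterion, the sketch below evaluates its standard closed form for a minimization problem, under the assumption that the surrogate's prediction at a candidate point is Gaussian with mean `mu` and standard deviation `sigma`. This is a generic textbook form, not the specific response surface models of the paper; the function name and arguments are hypothetical.

```python
import math

def expected_improvement(mu, sigma, best):
    """Closed-form EI for minimization under a Gaussian prediction N(mu, sigma^2).

    `best` is the lowest objective value observed so far.
    """
    if sigma <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (best - mu) * cdf + sigma * pdf

# Usage: score two hypothetical candidates against a current best loss of 0.30.
ei_confident = expected_improvement(mu=0.28, sigma=0.01, best=0.30)
ei_uncertain = expected_improvement(mu=0.35, sigma=0.10, best=0.30)
```

    A sequential method would evaluate the candidate with the largest score next, which trades off a low predicted loss against high predictive uncertainty.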

    Collaborative hyperparameter tuning

    Hyperparameter learning has traditionally been a manual task because of the limited number of trials. Today's computing infrastructures allow bigger evaluation budgets, thus opening the way for algorithmic approaches. Recently, surrogate-based optimization was successfully applied to hyperparameter learning for deep belief networks and to WEKA classifiers. The methods combined brute-force computational power with building models of the behavior of the error function in the hyperparameter space, and they could significantly improve on manual hyperparameter tuning. What may make experienced practitioners even better at hyperparameter optimization is their ability to generalize across similar learning problems. In this paper, we propose a generic method to incorporate knowledge from previous experiments when simultaneously tuning a learning algorithm on new problems at hand. To this end, we combine surrogate-based ranking and optimization techniques for surrogate-based collaborative tuning (SCoT). We demonstrate SCoT in two experiments where it outperforms standard tuning techniques and single-problem surrogate-based optimization.

    A correspondence between zeros of time-frequency transforms and Gaussian analytic functions

    In this paper, we survey our joint work on the point processes formed by the zeros of time-frequency transforms of Gaussian white noises [1], [2]. Unlike both references, we present the work from the bottom up, stating results in the order they came to us and commenting on what we were trying to achieve. The route to our more general results in [2] was a sort of ping-pong game between signal processing, harmonic analysis, and probability. We hope that narrating this game gives additional insight into the more technical aspects of the two references. We conclude with a number of open problems that we believe are relevant to the SampTA community.

    Large-Scale Distributed Bayesian Matrix Factorization using Stochastic Gradient MCMC

    Despite having various attractive qualities such as high prediction accuracy and the ability to quantify uncertainty and avoid over-fitting, Bayesian Matrix Factorization has not been widely adopted because of the prohibitive cost of inference. In this paper, we propose a scalable distributed Bayesian matrix factorization algorithm using stochastic gradient MCMC. Our algorithm, based on Distributed Stochastic Gradient Langevin Dynamics, not only matches the prediction accuracy of standard MCMC methods like Gibbs sampling, but is also as fast and simple as stochastic gradient descent. In our experiments, we show that our algorithm can achieve the same level of prediction accuracy as Gibbs sampling an order of magnitude faster. We also show that our method reduces the prediction error as fast as distributed stochastic gradient descent, achieving a 4.1% improvement in RMSE for the Netflix dataset and a 1.8% improvement for the Yahoo music dataset.
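
    To give a feel for the basic, single-machine Stochastic Gradient Langevin Dynamics update that the distributed algorithm builds on, here is a minimal sketch on a toy problem: inferring the mean of a unit-variance Gaussian under a Gaussian prior. It is not the matrix factorization model or the distributed scheme of the paper; the function name, parameters, and the toy model are all assumptions made for illustration.

```python
import numpy as np

def sgld_gaussian_mean(data, n_steps=3000, batch_size=10, step_size=1e-4,
                       prior_std=10.0, seed=0):
    """SGLD samples for the mean of N(theta, 1) data with a N(0, prior_std^2) prior."""
    rng = np.random.default_rng(seed)
    n_total = data.size
    theta = 0.0
    samples = np.empty(n_steps)
    for t in range(n_steps):
        # A random minibatch gives a noisy estimate of the likelihood gradient.
        batch = rng.choice(data, size=batch_size, replace=False)
        grad_log_prior = -theta / prior_std**2
        # Rescale the minibatch gradient so it is unbiased for the full dataset.
        grad_log_lik = (n_total / batch_size) * np.sum(batch - theta)
        # Half a gradient step plus Gaussian noise with variance equal to the step size.
        noise = rng.normal(0.0, np.sqrt(step_size))
        theta += 0.5 * step_size * (grad_log_prior + grad_log_lik) + noise
        samples[t] = theta
    return samples

# Usage: synthetic data with true mean 2; discard early samples as burn-in.
data = np.random.default_rng(1).normal(2.0, 1.0, size=1000)
posterior_draws = sgld_gaussian_mean(data)[500:]
```

    Without the injected noise term, the update reduces to plain stochastic gradient ascent on the log posterior, which is why the per-step cost is comparable to stochastic gradient descent.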

    Hierarchical Bayesian inference for ion channel screening dose-response data

    Dose-response (or 'concentration-effect') relationships commonly occur in biological and pharmacological systems and are well characterised by Hill curves. These curves are described by an equation with two parameters: the inhibitory concentration 50% (IC50); and the Hill coefficient. Typically just the 'best fit' parameter values are reported in the literature. Here we introduce a Python-based software tool, PyHillFit, and describe the underlying Bayesian inference methods that it uses, to infer probability distributions for these parameters as well as the level of experimental observation noise. The tool also allows for hierarchical fitting, characterising the effect of inter-experiment variability. We demonstrate the use of the tool on a recently published dataset on multiple ion channel inhibition by multiple drug compounds. We compare the maximum likelihood, Bayesian and hierarchical Bayesian approaches. We then show how uncertainty in dose-response inputs can be characterised and propagated into a cardiac action potential simulation to give a probability distribution on model outputs.
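
    The two-parameter Hill curve mentioned above can be written down in a few lines. The sketch below uses one common parameterisation, percentage block as a function of concentration, and is an assumption for illustration only; it is not taken from PyHillFit, and the function name `hill_block` is hypothetical.

```python
import numpy as np

def hill_block(conc, ic50, hill):
    """Percentage block predicted by a Hill curve.

    conc: drug concentration(s), in the same units as ic50
    ic50: concentration giving 50% block
    hill: Hill coefficient, which controls the steepness of the curve
    """
    conc = np.asarray(conc, dtype=float)
    return 100.0 * conc**hill / (conc**hill + ic50**hill)

# Usage: block at a few concentrations for a hypothetical compound.
responses = hill_block([0.1, 1.0, 10.0], ic50=1.0, hill=1.0)
```

    By construction the curve passes through 50% block at `conc == ic50`; a Bayesian fit would place probability distributions over `ic50`, `hill`, and the observation noise rather than reporting single best-fit values.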